Using Conditional Random Fields to Exploit Token Structure and Labels for Accurate Semantic Annotation
Abstract
Automatic semantic annotation of structured data enables unsupervised integration of data from heterogeneous sources, but it is difficult to perform accurately. Many fields are numeric or contain proper nouns, which rules out reference-based approaches, and the absence of natural-language text rules out language-based approaches. In addition, several semantic types have multiple heterogeneous representations while sharing syntactic structure with other types. In this work, we propose a new approach that uses conditional random fields (CRFs) to perform semantic annotation of structured data. It exploits the structure and labels of the tokens within each field to label fields more accurately, while still permitting exact inference. We compare our approach with a linear-CRF-based model that labels only fields, and with a regular-expression-based approach.
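The abstract says the model exploits the syntactic structure of the tokens inside a field while keeping exact inference tractable, but it does not spell out the model's structure. As a rough illustration only, the sketch below tokenizes a field value, derives a character-class "shape" feature for each token, and runs exact Viterbi decoding in a plain linear-chain CRF over those tokens. The token labels (`HOUR`, `SEP`, `MINUTE`, `AMPM`), the feature set, and the hand-set weights are all hypothetical; a real model would learn its weights from data, and the authors' model also assigns a label to the whole field, which this sketch does not attempt.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical token labels for a time-of-day field such as "10:45 PM".
LABELS = ["HOUR", "SEP", "MINUTE", "AMPM"]

# Hypothetical hand-set (not learned) weights.
# Any (label, feature) or (prev_label, label) pair not listed scores 0.
EMIT_W: Dict[Tuple[str, str], float] = {
    ("HOUR", "shape=d"): 1.0,
    ("HOUR", "shape=dd"): 1.0,
    ("MINUTE", "shape=dd"): 1.0,
    ("SEP", "shape=:"): 2.0,
    ("AMPM", "shape=AA"): 2.0,
}
TRANS_W: Dict[Tuple[str, str], float] = {
    ("START", "HOUR"): 1.0,
    ("HOUR", "SEP"): 1.0,
    ("SEP", "MINUTE"): 1.0,
    ("MINUTE", "AMPM"): 1.0,
}


def tokenize(value: str) -> List[str]:
    """Split a field value into digit runs, letter runs, and single punctuation marks."""
    return re.findall(r"\d+|[A-Za-z]+|[^\sA-Za-z\d]", value)


def shape(token: str) -> str:
    """Map digits to 'd', uppercase to 'A', lowercase to 'a'; keep other characters."""
    classes = []
    for ch in token:
        if ch.isdigit():
            classes.append("d")
        elif ch.isupper():
            classes.append("A")
        elif ch.islower():
            classes.append("a")
        else:
            classes.append(ch)
    return "".join(classes)


def features(token: str) -> List[str]:
    """Features of a single token; here only its character-class shape."""
    return [f"shape={shape(token)}"]


def viterbi(tokens: List[str]) -> Tuple[List[str], float]:
    """Exact MAP decoding of a linear-chain CRF over the tokens of one field."""
    if not tokens:
        return [], 0.0
    feats = [features(t) for t in tokens]

    def emit(label: str, i: int) -> float:
        return sum(EMIT_W.get((label, f), 0.0) for f in feats[i])

    def trans(prev: str, label: str) -> float:
        return TRANS_W.get((prev, label), 0.0)

    # best[i][y]: highest score of any label sequence for tokens[0..i] ending in y.
    best = [{y: trans("START", y) + emit(y, 0) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[i - 1][p] + trans(p, y))
            best[i][y] = best[i - 1][prev] + trans(prev, y) + emit(y, i)
            back[i][y] = prev
    # Follow back-pointers from the best final label to recover the full sequence.
    last = max(LABELS, key=lambda y: best[-1][y])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    for value in ["10:45", "10:45 PM"]:
        print(value, viterbi(tokenize(value)))
```

In `"10:45"` both `10` and `45` have shape `dd`, so the emission scores alone cannot tell the hour from the minute; the transition weights (`HOUR` to `SEP` to `MINUTE`) break the tie. Because the model is a chain, Viterbi decoding finds the highest-scoring labeling exactly in time linear in the number of tokens.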
Publication date: 2011